clickhouse: prevent replicated tables from starting in read-only mode. #9183
Conversation
Force-pushed from d84e49b to 11b96f2.
<!-- Disable sparse column serialization, which we expect to not need -->
<ratio_of_defaults_for_sparse_serialization>1.0</ratio_of_defaults_for_sparse_serialization>
<!-- Prevent ClickHouse from setting distributed tables to read-only. -->
Doc nit: this is setting the local replica to read-only. I.e., it avoids updating the local state with the shared state in the Raft cluster.
Looks good. As far as testing, I think we should figure out how to check whether this is being used on the Dogfood rack. That is, can we see in the logs that it would have set the table to read-only mode, but did not because of this setting?
Dogfood is the only place we run the replicated cluster.
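As a side note (not part of this PR): one rough way to check the current state on the deployed cluster is to ask ClickHouse's `system.replicas` system table which replicas are read-only right now. This only shows the current state, not the "would have gone read-only but didn't" case, which needs the log lines discussed further down.

```sql
-- List any replicated tables whose local replica is currently read-only.
SELECT database, table, is_readonly, zookeeper_exception
FROM system.replicas
WHERE is_readonly = 1;
```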
Just a nit here: it's so that ClickHouse always accepts the shared state, not the local state.
On start, ClickHouse compares the local state of each distributed table to its distributed state. If it finds a discrepancy, it starts the table in read-only mode. When this happens, oximeter can't write new records to the relevant table(s). In the past, we've worked around this by manually instructing ClickHouse using the `force_restore_data` sentinel file, but this requires manual detection and intervention each time a table starts up in read-only mode. This patch sets the `replicated_max_ratio_of_wrong_parts` flag to 1.0 so that ClickHouse always accepts shared state, and never starts tables in read-only mode.

As described in ClickHouse/ClickHouse#66527, this appears to be a bug, or at least an ergonomic flaw, in ClickHouse. One replica of a table can routinely fall behind the others, e.g. due to restart or network partition, and shouldn't require manual intervention to start back up.

Part of #8595.
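For illustration only, here is a rough sketch of how the two settings might sit together in the replicated server config. The `<clickhouse>`/`<merge_tree>` nesting and exact placement are assumptions, not taken from this diff; the `replicated_max_ratio_of_wrong_parts` value is the 1.0 described above.

```xml
<!-- Sketch only: element nesting and file layout are assumed, not from the PR. -->
<clickhouse>
  <merge_tree>
    <!-- Disable sparse column serialization, which we expect to not need -->
    <ratio_of_defaults_for_sparse_serialization>1.0</ratio_of_defaults_for_sparse_serialization>
    <!-- Tolerate any fraction of mismatched local parts at startup, so the
         replica accepts the shared state instead of going read-only. -->
    <replicated_max_ratio_of_wrong_parts>1.0</replicated_max_ratio_of_wrong_parts>
  </merge_tree>
</clickhouse>
```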
Force-pushed from 11b96f2 to 3b34575.
I think we can find some interesting log lines at https://github.com/ClickHouse/ClickHouse/blob/master/src/Storages/StorageReplicatedMergeTree.cpp#L1959-L1984. When we find mismatched parts and switch a table to read-only mode, ClickHouse throws an exception; when we don't throw an exception, it logs a warning instead.

After rolling out this change, I think we should expect to see the latter (warning) log line in the ClickHouse logs. Anyhow, I'm going to merge this PR, and we'll see what happens after the next deploy.
On start, ClickHouse compares the local state of each distributed table to its distributed state. If it finds a discrepancy, it starts the table in read-only mode. When this happens, oximeter can't write new records to the relevant table(s). In the past, we've worked around this by manually instructing ClickHouse using the `force_restore_data` sentinel file, but this requires manual detection and intervention each time a table starts up in read-only mode. This patch sets the `replicated_max_ratio_of_wrong_parts` flag to 1.0 so that ClickHouse always accepts local state, and never starts tables in read-only mode.

As described in ClickHouse/ClickHouse#66527, this appears to be a bug, or at least an ergonomic flaw, in ClickHouse. One replica of a table can routinely fall behind the others, e.g. due to restart or network partition, and shouldn't require manual intervention to start back up.

Part of #8595.
Note: I'm not sure how best to test this. It sounds like we have reasonably high confidence that the fix will work, so we could just merge and deploy to dogfood, and revert if necessary. Or is the ClickHouse cluster running on another rack that we could test on?
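For context, the manual workaround mentioned in the description is ClickHouse's documented `force_restore_data` sentinel file. A rough sketch of that procedure, assuming the stock `/var/lib/clickhouse` data directory (the real path and service management differ on the rack):

```sh
# Drop the sentinel file in the flags directory, then restart the server so it
# restores replicated tables from the shared state in Keeper on startup.
# /var/lib/clickhouse is the stock data directory; adjust for the deployment.
sudo -u clickhouse touch /var/lib/clickhouse/flags/force_restore_data
# ...then restart clickhouse-server however the deployment manages it.
```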